Population genetic structure of Aedes aegypti subspecies in selected geographical locations in Sudan

Although knowledge of the composition and genetic diversity of disease vectors is important for their management, such knowledge is limited in many instances. In this study, the population structure and phylogenetic relationship of the two Aedes aegypti subspecies, namely Aedes aegypti aegypti (Aaa) and Aedes aegypti formosus (Aaf), in eight geographical areas in Sudan were analyzed using seven microsatellite markers. Tests of Hardy–Weinberg Equilibrium (HWE) for the two subspecies revealed that Aaa deviated from HWE at all seven microsatellite loci, while Aaf exhibited departure at five loci and no departure at two loci (A10 and M201). The Factorial Correspondence Analysis (FCA) plots revealed that the Aaa populations from Port Sudan, Tokar, and Kassala clustered together (which is consistent with the unrooted phylogenetic tree), the Aaf populations from Fasher and Nyala clustered together, and the Gezira, Kadugli, and Junaynah populations also clustered together. The Bayesian cluster analysis structured the populations into two groups, suggesting two genetically distinct groups (subspecies). The isolation by distance test revealed a moderate to strong significant correlation between geographical distance and genetic variation (p = 0.003, r = 0.391). The migration network created using divMigrate demonstrated that migration and gene exchange between subspecies populations appear to occur based on their geographical proximity. This study revealed the genetic structure of the Ae. aegypti subspecies populations and the gene flow among them, which may be interpreted as the mosquito vector's capacity for dispersal. These findings will help to improve dengue epidemiology research, including information on the identity of the target vector/subspecies, and the arbovirus vector surveillance program.


Mosquito identification and genetic variability
The results of the identification showed that the Aedes aegypti mosquitoes from the western and southern parts of the country (Darfur and Kordofan), namely Nyala (N), Al Fasher (F), Al Junaynah (J), and Kadugli (D), were Aaf. Samples from each of the four towns located in the eastern and central parts of the country, namely Port Sudan (P), Tokar (T), Kassala (K), and Barakat/Gezira (G), were morphologically identified as the Aaa subspecies (Fig. 1).
At the seven microsatellite loci, 202 Aedes aegypti mosquitoes from eight different sites were genotyped. Not all the loci were successfully amplified in all the examined locations, and the number of genotyped individuals per locus varied from 5 to 31 (Table 1). There was no evidence of scoring errors due to large allele dropout or stuttering, and no evidence of null alleles at any locus. All seven loci were polymorphic, albeit to varying degrees, with the number of alleles per locus ranging from 14 at locus A10 (AR = 3.4) to 37 at locus B19 (AR = 3.8) (Table 2). The average number of alleles across the seven loci in the populations ranged from 8.7 ± 4.27 in Fasher to 14.3 ± 6.18 in Kassala, with an average of 12.3 ± 2.16 alleles per locus (Table 1).
Private alleles (restricted to a single population) were observed at all loci except locus A10 and accounted for 46 of the 183 alleles (25.1%) recorded across all loci at all sites, while G11 recorded the highest number of private alleles. The greatest number of private alleles was observed at Kadugli and Nyala with 7 private alleles in both, followed by Junaynah and Gezira with 8 private alleles (Table 2). All microsatellite loci of the Ae. aegypti populations were found to be polymorphic, with the average number of alleles per locus ranging from 9.25 (G11) to 20.38 (B07) (Table 2).
The number of alleles (NA), allelic range, allelic richness (AR), and gene diversity (Gd) were used to evaluate genetic diversity, and the results showed variation across loci and sites (Table 2). Although allelic richness varied among sites and loci, the average AR was fairly consistent, ranging from 3.08 in M313 to 3.79 in B07. Generally, all the sites showed relatively high gene diversity, ranging from 0.812 to 0.915. Barakat/Gezira had the highest average gene diversity (0.949) among sites (Fig. 2).
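The definition of a private allele used above (an allele restricted to a single population) can be illustrated with a minimal Python sketch. The function name, the input layout (one set of allele sizes per population, for a single locus) and the toy allele sizes are assumptions made for illustration only; they are not data from this study, and the summary statistics reported here were produced with CONVERT and PGDSpider.

```python
from collections import Counter

def private_alleles(alleles_by_pop):
    """Return, for one locus, the alleles found in exactly one population.

    alleles_by_pop: dict mapping a population label to the set of allele
    sizes observed in that population (hypothetical input layout).
    """
    # Count in how many populations each allele occurs.
    occurrences = Counter(
        allele for alleles in alleles_by_pop.values() for allele in set(alleles)
    )
    # An allele is private to a population if no other population carries it.
    return {
        pop: sorted(a for a in set(alleles) if occurrences[a] == 1)
        for pop, alleles in alleles_by_pop.items()
    }

# Toy example with invented allele sizes (not data from the study).
toy = {"P": {100, 102, 104}, "T": {102, 104}, "K": {104, 110}}
result = private_alleles(toy)
print(result)
```

Summing the list lengths over loci and populations would give the total number of private alleles for a data set in this layout.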

Hardy-Weinberg equilibrium (HWE), linkage disequilibrium (LD), and F IS among the eight populations of Aedes aegypti
All loci showed significant deviations from HWE (in one or more populations) except locus M201, which conformed to HWE. Overall, 14 out of 56 tests (25%) departed significantly from equilibrium after Benjamini-Hochberg multiple testing correction (Table 3). The Port Sudan, Kadugli and Junaynah populations showed no significant deviation from HWE, which suggests that those populations conformed to HWE (Table 3). Significant linkage disequilibrium was found in 39 of the 168 pairwise comparisons between individual loci at each site (23.2% of tests performed) (Table 4).
The inbreeding coefficient (F IS) over all loci showed that the majority of populations had an excess of observed heterozygotes (many negative values). A high inbreeding rate was observed within these populations (F IS average ranged from 0.021 to 0.179) (Table 3).
Table 1. Mean and total number of alleles at seven loci in eight populations. P: Port Sudan, T: Tokar, K: Kassala, G: Gezira, D: Kadugli, N: Nyala, F: Fasher, J: Junaynah.

Genetic diversity, LD, HWE and F IS for the two subspecies of Ae. aegypti
In the Aaa subspecies, the p-value was significant across the 7 loci, indicating deviation from HWE, while linkage disequilibrium was identified in 10 out of 21 pairs (47.6%), with an average inbreeding coefficient (F IS) of −0.077 and a moderate to low F ST value (0.023). In the Aaf subspecies, HWE tests showed departure at all loci except A10 and M201, with linkage disequilibrium noted in 7 out of 21 pairs (33%), F ST = 0.019 and an average F IS of −0.086 (Table 5). The Wilcoxon sign-rank test and mode shift test revealed no evidence of a recent population bottleneck in any population. All loci fitted mutation-drift equilibrium under the TPM, with a normal L-shaped allele frequency distribution, since the one-tailed probability for heterozygosity excess was close to 1 in all populations (non-significant, p > 0.05) (Table 3).
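As a rough illustration of the Benjamini-Hochberg correction applied to the 56 HWE tests, the step-up procedure can be sketched in a few lines of Python. This is a generic sketch of the procedure, not the code used in the study; the function name and the toy p-values are assumptions for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at or below that cut-off.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Toy p-values (invented, not results from the study).
toy_p = [0.01, 0.04, 0.03, 0.5]
decisions = benjamini_hochberg(toy_p, alpha=0.05)
print(decisions)
```

In the toy example only the smallest p-value survives the correction, even though three of the four raw p-values are below 0.05.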

Molecular variation and differentiation in Ae. aegypti populations
Hierarchical AMOVA was initially performed on the two groups of Aedes aegypti: Ae. aegypti aegypti from the Port Sudan, Tokar, Kassala and Gezira populations and Ae. aegypti formosus from the Kadugli, Nyala, Fasher and Junaynah populations. The variance components in this comparison revealed a high percentage of variation within populations (96.02%) compared with variation among groups (2.23%) (Table 6). Both the F CT (diversity between groups) estimate (F CT = 0.0224) and the F SC (diversity among populations within a group) value (F SC = 0.018) were significant (p < 0.05). The isolation by distance test between all population pairs (Mantel test) was highly significant (p = 0.003) with a moderate relationship (correlation coefficient r = 0.391). Thus, the correlation between the geographical and genetic distance matrices suggested that landscape features may have some influence on the genetic differentiation (Fig. 3).
The migration network built with divMigrate and based on Nm estimates revealed strong gene flow between the Port Sudan, Kassala and Tokar Aaa populations, which are geographically located in eastern Sudan, as well as between the Nyala and Fasher populations (Aaf populations). The relative migration values were highest between the Tokar and Kassala populations and between the Fasher and Nyala populations (Fig. 4). STRUCTURE analysis was then performed and, according to Wright's values, the overall F ST = 0.03981 revealed moderate genetic differentiation. Generally, F ST values among Aedes aegypti samples across the eight study sites were low (0.00-0.045) (Table 7).
Table 2. Summary statistics of the number of individuals genotyped (N), allelic range, allelic richness (AR), number of alleles (NA) and gene diversity (Gd) at each locus and in each population of Ae. aegypti. NA is the number of alleles observed in each sample, with the number of private alleles in parentheses; allelic range is the number of repeat units that the alleles span.

The unrooted neighbor-joining (NJ) phylogram tree revealed two segregated groups (two main clusters), splitting the localities (Fig. 5). The first group included three populations of Aaa, namely Port Sudan, Tokar and Kassala, while the second group contained all populations of Aaf, namely Kadugli, Nyala, Fasher and Junaynah, in addition to Gezira, whose Aaa clustered in group 2 (Fig. 6).

Table columns: Grouping of populations; Source of variation; Sum of squares; Variance components; Percentage variation. Grouping: two groups according to morphological identification (Group 1: P, T, K and G; Group 2: D, N, F and J).
The Bayesian cluster analysis using STRUCTURE revealed that when K = 2, the estimate of Delta K and the likelihood of the data (LnP(D)) were largest, implying two genetically separate groups (Fig. 7). After completing the first run of Structure and Structure Harvester, group 1 (red cluster) consisted of the Ae. aegypti aegypti populations east of the Nile River, including Port Sudan, Tokar, Kassala, and Gezira, and group 2 (blue cluster) consisted of the Ae. aegypti formosus populations Kadugli, Nyala, Fasher, and Junaynah, suggesting that there are two main populations (basically, the two subspecies). A similar pattern of population clustering was further substantiated by the DAPC analysis (Fig. 8), where all the populations were seen overlapping, except the Junaynah Aaf population. However, Kadugli, Nyala and Fasher (Aaf populations) appeared to be somewhat distant from the other Aaa populations (Fig. 8).

Discussion
The threat of emerging and re-emerging arboviral infections is increasing quickly around the world, notably in Africa 29. In Sudan, arboviral illnesses have become a major public health concern. Yellow fever, dengue fever, and chikungunya epidemics have caused substantial mortality and morbidity in different parts of the country during the last two decades, mainly in Port Sudan and Kassala in the east and Darfur in the west 2,5,30,31. Aedes aegypti has been reported in Sudan since 1903 and was described for the first time in Khartoum by Balfour 2. It plays a critical role in the spread of the viruses that cause these diseases 1,32. On the African continent, two Ae. aegypti subspecies/forms, known as Ae. aegypti aegypti (Aaa) and Ae. aegypti formosus (Aaf), exist, and these subspecies differ in their distribution, behaviour, breeding sites, and virus transmission capacity 1,33. Previous studies 1,28 described the distribution and genetic diversity of Aedes aegypti subspecies across the Sahelian belt in Sudan using the cytochrome oxidase and NADH dehydrogenase subunit 4 (ND4) mitochondrial gene markers. In this study, we used microsatellite markers to investigate the genetic structure and differentiation of populations of the Ae. aegypti subspecies/forms in Sudan.
Overall, the genetic diversity of Ae. aegypti estimated in this study was relatively high (NA = 14-37; AR = 2.8-3.5; Gd = 0.818-0.915; HO = 0.878-0.982; HE = 0.816-0.911) compared to other population structure studies of Ae. aegypti using microsatellites. The wide range in the number of alleles in this study reflects the highly polymorphic nature of the selected microsatellite markers. A recent study of Aedes aegypti populations in Sudan (Ref. 3) reported a lower allelic range (7-21) than that reported in this study. However, other research 16,34 showed allelic ranges closer to ours, ranging from 15 to 32 and 5 to 36 alleles, respectively. The allelic richness in the study of Ref. 3 ranged between 4.16 and 8.67, which was higher than the richness in our study (AR = 2.8-3.5), which in turn was higher than that of Ref. 34 (1.629-2.945).
Generally, the populations of Aaa possessed slightly higher F ST values (F ST = 0.023) compared to the Aaf populations (F ST = 0.019), which is consistent with the ND4 mtDNA dataset 28. However, the CO1 mtDNA genetic diversity revealed contradictory results, with no difference between the Aaa and Aaf subspecies 1. Significant deviations from HWE were revealed in 14 out of 56 tests (25%), and this showed a significant departure from HWE for the two subspecies. All the populations of the Aaa subspecies/form departed from HWE, while in the Aaf subspecies/form there was a departure at all loci except A10 and M201. A similar pattern was observed in a study in Senegal that tested HWE in 16 out of 56 possible tests and, from those, detected one significant deviation in the Aaf samples and five in Aaa 35. A worldwide study found that HWE occurred in 42 of 300 populations of Ae. aegypti subspecies/forms 22.
The inbreeding coefficient revealed a higher average in Aaa (F IS average = 0.086) compared to Aaf populations (F IS average = 0.077). The average allelic richness was similar between the two subspecies populations, which agreed with a study from Gabon and Kenya 18. Linkage disequilibrium was limited and was not consistently observed for any locus pair, suggesting that it was not the result of physical linkage (co-segregation of alleles at loci on the same chromosome). Instead, the significant results could most likely be explained by localised demographic effects.
The isolation by distance test revealed a highly significant moderate correlation (p = 0.003, correlation coefficient r = 0.391) between the genetic diversity at the microsatellite loci across all populations of Ae. aegypti in this study and the geographical distance. This was concordant with Ref. 1, which used the CO1 mtDNA dataset and found a significant moderate relationship (correlation coefficient r = 0.586, p = 0.005). Another study in Sudan 3 also found a correlation between genetic variation and the geographical distance between study sites in the east and west of Sudan (R2 = 0.4272, p = 0.01), which strongly supports our finding that the isolation of the subspecies was most probably by distance.
The unrooted neighbor-joining tree clustered the populations of Port Sudan, Tokar and Kassala (Aaa) in one group, while the Fasher Aaf population stood alone, Gezira (Aaa) and Kadugli (Aaf) clustered together, and Junaynah and Nyala (Aaf) clustered together. Thus, the Aaa populations formed one group with the exception of Gezira, which clustered with the Aaf group; this result is similar to the study of Ref. 1. Also, the genetic structuring of the two subspecies of Aedes aegypti is in agreement with the recent study of Ref. 3, which indicated the presence of two genetically distinct subspecies of Ae. aegypti.
The three-dimensional factorial correspondence analysis (FCA) plot demonstrated the genetic grouping among sites: Port Sudan, Tokar and Kassala (Aaa) grouped together, Fasher and Nyala (Aaf) clustered together, while Gezira, Kadugli and Junaynah constituted the third group. It is worth noting that members of groups 2 and 3 were geographically related (located in the west and middle parts of the country), while members of group 1 (located in the eastern part of the country) were more diverged (high bootstrap of 95). These results might indicate recent historical gene flow, which could be linked to the geographical distances between the different groups; e.g., Port Sudan in group 1 is located only 141 km from Tokar in the same group, which is located 1713 km from Junaynah in group 3. The significant relationship found in the isolation by distance analysis (correlation coefficient r = 0.391, p = 0.003) might explain the limited gene flow between the subspecies populations, and this is supported by the migration network, which indicated high gene flow between geographically close populations.
AMOVA results for the microsatellite loci indicated a high percentage of variance within populations (96.02%) compared with variation among groups (2.23%). AMOVA results showed higher percentages of variation among the two subspecies groups for both mitochondrial genes, CO1 and ND4 (39.22% and 26.64%, respectively) 1,28, with high genetic variation within populations (53.53%) and less among groups. Another study 3 indicated that the majority of genetic variation in Aedes aegypti populations from Sudan was among individuals and within regions, with just 5% of the total variation related to variation between groups, which was consistent with this study.
Interestingly, the Bayesian model-based clustering was largely congruent in partitioning the populations into two genetic groups (best structure K = 2) and clearly indicated that the two subspecies/forms populations were structured in two groups. A study conducted in Kenya and Gabon reported a comparable first split of all the samples into two clusters (K = 2); however, the STRUCTURE results indicate that the forms are clearly two, although not totally separated, and this split roughly represents the strong genetic differentiation between Aaf and Aaa, as suggested in previous studies 17,19. In this study, the two genetically distinct groupings are perhaps linked to the historically documented isolation stated by Ref. 36 and match the geographic dispersion reported in Ref. 1. Although the gene flow across subspecies populations appears low, the migration network revealed that gene flow within the Aaa and Aaf populations seemed to follow their geographical location rather than their forms/subspecies. There is migration and moderate gene flow between Kadugli (Aaf) and Gezira (Aaa), whereas strong gene flow was found among the Aaa populations (Port Sudan, Kassala and Tokar). These findings agree with the study of Ref. 1, which indicated limited gene flow between the two subspecies populations, mostly attributed to the geographical distances as well as the different ecological environments that restrict the flight range of Aedes mosquitoes.
The genetic structure of the Ae. aegypti subspecies based on the microsatellite markers revealed that the populations of the two subspecies were separated into two groups, especially the populations of Aaf, which clustered together under the different clustering methods (NJ tree, FCA plotting and STRUCTURE). Although mitochondrial genetic variation 1,28 revealed low gene flow and high genetic diversity between the two subspecies populations in Sudan, it is difficult to say whether this variation reflects a true difference between the two subspecies or the geographical distances that limited gene flow.
Conversely, a recent study in Gabon and Kenya found little genetic isolation between forest and domestic Ae. aegypti, implying that there may be extensive gene flow between them, while phylogenetic relationships revealed a clear separation between the two sites 18. It is likely that gene flow between the two subspecies began recently, with the Aaf invasion into human habitats where Aaa already existed. Powell and Tabachnick determined from genetic data that there was complete isolation and absence of gene flow between the two subspecies around 400-550 years ago 26.
In this investigation, microsatellite-based estimates of genetic structure showed that the two groups were genetically diverse and distinct. Overall, these findings help us better understand the forms of Ae. aegypti in East Africa, where data are scarce. Different populations have widely varying vector competence due to phenotypic variation. The sensitivity of Ae. aegypti aegypti to disease transmission may be connected to insect population migration and/or possible intermingling of individuals from different locations. As a result, population genetic studies require determining the genetics of these populations and investigating the genetic variation linked to vector abilities 37.
Our research explored the genetic structure, gene flow and diversity of the two subspecies of Aedes aegypti vector populations across different regions of Sudan. These data can be used to track the effectiveness of control measures, changes in gene flow patterns, and new introductions. The vectorial capacity of Ae. aegypti populations and subspecies to spread arboviruses varies greatly 33,37,38.
Bearing in mind that the two subspecies differ in their behaviour and potential to transmit disease, their distribution and presence in each arboviral outbreak area in Sudan should be considered when developing any vector control intervention 1,26,39. Our findings will be essential to the control program's success if the nation adopts innovative vector control strategies. According to Ref. 40, a genetic alteration that depends on enduring genetic variation in populations must be specific to the intended population.
Future studies on vector behaviour, vector competence, breeding habitats, viral transmission, and genetic variation and structure of Ae. aegypti subspecies at other sites, using larger sample sizes and more study sites, are recommended in order to improve the surveillance system for the Ae. aegypti vector.
Lastly, human-mediated migration and mobility may promote the long-distance spread of vectors, resulting in the admixture of populations adapted to urban and forest environments, which may have consequences for the management and transmission of disease. The government must designate effective preventive and control measures, strengthen environmental governance in the areas inhabited by both subspecies in accordance with their vectorial potential and gene flow, and implement mosquito control measures.

Mosquito sample collection and identification
Samples of Ae. aegypti larvae and pupae were collected (January 2014-April 2017) from both indoor and outdoor breeding habitats at eight study sites (Port Sudan, Tokar, Kassala, Fasher, Nyala, Gezira, Kadugli, and Junaynah) described in Ref. 1. The study sites were selected according to past reports of dengue and other arbovirus cases and Aedes aegypti vector records. Mosquito aquatic stages were then transferred to the insectarium at the National Public Health Laboratory (NPHL) in Khartoum, Sudan, where the samples were sorted, classified, placed in trays with water and larval food 1 and reared to adults at optimum temperature (25 ± 2 °C) and relative humidity (80-90%) with a 12:12 (L:D) photoperiod.
Using appropriate taxonomic keys 41, the larvae were identified morphologically to species level. After adult emergence, Ae. aegypti females were identified to subspecies level according to the morphological taxonomic key 42. The identified female mosquitoes (Aaa and Aaf) were individually preserved in labelled microfuge tubes with 70% isopropanol and then placed in a freezer at −20 °C. The preserved samples were transferred to the Universiti Sains Malaysia (USM) prior to proceeding with the molecular work.

Genomic DNA extraction
Aedes aegypti samples from each study site (a minimum of 10 individuals per site) were used for extraction. Prior to extraction, the mosquito samples were washed twice using ethanol and distilled water and dried. Using the DNeasy Blood and Tissue Extraction Kit (Qiagen, Germany), genomic DNA was extracted from single female mosquitoes following the manufacturer's instructions with minor adjustments (the incubation at 65 °C was extended to overnight to increase cell lysis). After extraction, genomic DNA was eluted in nuclease-free water and stored in a freezer at −20 °C. DNA integrity was assessed and visualized by 0.8% (w/v) agarose gel electrophoresis in 0.5X TBE buffer, and the quantity was further assessed using a Q3000 UV spectrophotometer (Quawell). Species identification was confirmed as explained by Ref. 43.

Microsatellite DNA molecular technique
Seven microsatellite markers (A10, B07, H08, G11, M313, M201, and B19) designed by Ref. 44 were selected. These markers are single-copy microsatellite sequences identified from enriched plasmid libraries and selected cosmid subclones, and they have proved quite useful in evaluating the population genetics of Ae. aegypti in a number of populations 44.
Singleplex PCRs were performed using a BioRad MyCycler™ Thermal Cycler (BioRad Laboratories, Inc.). According to the supplier's (Promega Company, USA) reaction mixture guideline, each 50 μl reaction volume contained 10 μl of 5X Green GoTaq Buffer (Promega), 3 μl of 25 mM MgCl2, 1 μl of 25 mM dNTP, 1 μl of each primer, 0.25 μl of Taq polymerase, 2 μl (> 50 ng) of template DNA and 31.75 μl of double distilled water. Fluorescently labelled primers (two dyes) were used to allow the subsequent fragment analysis, as shown in Table 8. For the marker primers A10, B07, H08 and G11, the PCR cycling conditions were an initial denaturation at 94 °C for 5 min, followed by 30 cycles of 94 °C for 1 min, annealing at 60 °C for 1 min and extension at 72 °C for 2 min, with a final extension at 72 °C for 10 min. For the B19, M313 and M201 marker primers, the cycling conditions were an initial denaturation at 94 °C for 5 min, followed by 39 cycles of 94 °C for 20 s, annealing at 55 °C for 20 s and extension at 72 °C for 30 s, with a final extension at 72 °C for 10 min. Primers, annealing temperatures, and their sequences are presented in Table 8.
The PCR products were analyzed by 2% agarose gel electrophoresis and visualized under ultraviolet light using the GelDoc-It® TS 310 UV documentation system (Ultraviolet Products Ltd., Cambridge, UK). Samples with clear bands were sent to NHK Bioscience Solutions Sdn. Bhd. for fragment analysis using an Applied Biosystems 3730XL DNA Analyzer.

Standard genetic procedures and variability
The peak size of each individual microsatellite allele fragment was identified, analysed, and scored using Peak Scanner v1.0 (Applied Biosystems) with an internal size standard (GS500LIZ). Samples were rescored, and amplification procedures repeated where possible, whenever PCR irregularities were encountered. Allele peaks in the electropherogram were scored according to Ref. 45. MicroChecker v2.2.3 46 was used to identify and rectify data irregularities, including typographic errors and scoring errors due to large allele dropout or stutter peaks caused by low DNA quality, and to detect and correct for microsatellite null alleles.
CONVERT v1.31 47 and PGDSpider v2.1.0.3 48 were used to create summary statistics for the microsatellite data (allele frequencies and private alleles for each locus in each population) and to convert the raw data so that it could be analysed in various software packages 47,48.

HWE, LD and FIS estimations
Arlequin v3.5.2.2 49 was used to assess genetic variation by measuring the mean number of alleles (NA) per locus and population and the allele size range. The software package FSTAT v2.9.3.2 50 was used to analyse diversity among sites using allelic richness (AR), with the rarefaction method to correct for differences in sample size. Using Arlequin v3.5.2.2 49, we estimated observed (HO) and expected (HE) heterozygosity per locus and population, as well as mean heterozygosity across all loci. Using the same program, the deviation from Hardy-Weinberg Equilibrium (HWE) was tested with an exact test using 10,000 Markov chain steps and 5000 dememorization steps. The likelihood ratio test of linkage disequilibrium based on the Expectation-Maximization (EM) algorithm 51 was performed on all pairwise locus comparisons for all sites in Arlequin v3.5.2.2 49 with 10,000 permutations to test for significant association between alleles at pairs of loci. With 10,000 permutations, an exact test was performed to look for statistically significant deviations from linkage equilibrium, i.e. independent segregation of genotypes [linkage disequilibrium (LD)], followed by the false discovery rate (FDR) adjustment 52 at the 9% significance level. The inbreeding coefficient (F IS) was also estimated using FSTAT v2.9.3.2 50, with a value ranging from −1 (no inbreeding) to +1 (high inbreeding, completely identical).
To detect any recent reduction in effective population size, BOTTLENECK v1.2.02 53,54 was used to perform the Wilcoxon sign-rank test and the mode shift test (distortion of the typical L-shaped distribution). Wilcoxon's test was run using the two-phased mutation model (TPM) 53,55, setting the proportion of the stepwise mutation model (SMM) in the TPM to 95% and the variance to 12. A total of 5000 simulation iterations was conducted, as suggested by Ref. 54. This included 95% single stepwise mutation and 5% infinite allele mutation, with statistical significance determined using 1000 simulations.
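To make the relationship between HO, HE and F IS concrete, the following Python sketch computes the three quantities for one locus in one population using the textbook relation F IS = 1 − HO/HE. It is a simplified illustration only: the function names and the toy genotypes are assumptions, HE is computed without a small-sample correction, and the estimates in this study come from Arlequin and FSTAT, which use their own estimators.

```python
from collections import Counter

def heterozygosities(genotypes):
    """Observed and expected heterozygosity for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    HE is the uncorrected value 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    return ho, he

def f_is(genotypes):
    """Simple inbreeding coefficient, F IS = 1 - HO / HE."""
    ho, he = heterozygosities(genotypes)
    if he == 0:
        raise ValueError("locus is monomorphic; F IS is undefined")
    return 1.0 - ho / he

# Toy genotypes (invented allele labels, not data from the study).
all_het = [(1, 2)] * 4                       # every individual heterozygous
hw_like = [(1, 2), (1, 2), (1, 1), (2, 2)]   # HO equals HE
print(f_is(all_het), f_is(hw_like))
```

With every individual heterozygous the sketch returns −1, and when HO equals HE it returns 0, which matches the interpretation of negative F IS values as heterozygote excess.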

Genetic structure and variations
An unrooted Neighbour-Joining phylogenetic tree was created with POPTREE2 56 using Nei's genetic distance (DA) 57 to visualize the relationships among sites, with 1000 bootstrap replications to determine the confidence level of each node 58. Pairwise genetic divergence between populations was estimated in Arlequin v3.5.2.2 49 using F ST, the proportion of the total genetic variance contained in a subpopulation (the S subscript) relative to the total genetic variance (the T subscript). Possible values range from 0 to 1, and a high F ST suggests that populations differ significantly from one another. Statistical significance was based on 10,000 permutations. Hierarchical Analyses of Molecular Variance (AMOVA) with 1000 random permutations were performed in Arlequin v3.5.2.2 49 to evaluate the relative contribution of variance among populations, among individuals within populations, and within individuals.
The Mantel correlation coefficient (r) between the matrices of genetic (F ST) and geographic distance was calculated using Arlequin v3.5.2.2 49 with 10,000 random permutations to see whether genetic relationships among sampling areas conformed to a pattern of genetic isolation by distance (IBD). Microsoft Excel was used to create the isolation by distance charts (km).
The Factorial Correspondence Analysis (FCA) was done as a complementary approach to a univariate test like F ST, since multilocus population genetic data are multivariate in nature 59. It was employed to assess population subdivision based on pairwise genetic distance among 202 individuals from eight Ae. aegypti populations.
GENETIX version 4.05 60 was used to perform FCA based on genotypic data obtained for individuals from the populations. Correction for multiple testing for HWE, LD, F IS and Wilcoxon's test was performed using the FDR approach as described in Benjamini & Hochberg (1995) at the 95% confidence level. Additionally, a clustering analysis was performed using Discriminant Analysis of Principal Components (DAPC) from Adegenet (Jombart, 2008). Furthermore, divMigrate (https://popgen.shinyapps.io/divMigrate-online/) was used to construct a network representing the relative rate and direction of migration among populations 61, with Nm as the measure of genetic distance. The significance of the Nm values was determined by performing 1000 bootstraps with α = 0.05.
Hierarchical variation that could be attributed to differences between groups was estimated for groupings based on the subspecies populations, on the subspecies identified in Ref. 28, and on the clustering at K = 2 from STRUCTURE. Three hierarchical levels of variation were tested for each run: among groups within total (F CT), among populations within groups (F SC) and among populations within total (F ST).
Following that, two distinct clustering methods were employed to identify groups of genetically related individuals and sampling locations, as well as to assess their spatial distribution. First, individuals were assigned to clusters using a Bayesian model-based clustering approach performed in STRUCTURE v2.3.3 62. The Bayesian clustering methodology employed in Structure 2.3.4 offered a comparative assessment of population structure 62,63. The number of clusters (K) was determined using the web software Structure Harvester, as reported by Ref. 64. We performed 15 independent runs for each K from 1 to 8, with 10,000 Markov Chain Monte-Carlo (MCMC) repetitions and a burn-in period of 1000 iterations, using the admixture model and correlated allele frequencies together with a uniform prior for α, with an initial value of 1 and a maximum of 10.0; λ was set at 1.0. For the selected value of K, we assessed the membership coefficients per individual per cluster (Q), setting the assignment threshold to Q > 0.80. Using STRUCTURE Harvester v0.6.94, the best number of clusters was illustrated by plotting the average estimated LnP(D) (Ln probability of the data) and the ΔK technique of Ref. 64.
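The logic of a Mantel permutation test can be sketched in plain Python as follows. This is a generic one-tailed illustration, not a reproduction of Arlequin's implementation; the function names, the seed, and the toy distance matrices are assumptions made for the example.

```python
import random
from math import sqrt

def _upper_triangle(matrix, order=None):
    """Flatten the upper triangle of a square matrix, optionally after
    relabelling rows and columns with the permutation `order`."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(genetic, geographic, permutations=10000, seed=0):
    """Return (r, p): the correlation between two distance matrices and a
    one-tailed permutation p-value for r being at least as large as observed."""
    rng = random.Random(seed)
    geo_vec = _upper_triangle(geographic)
    r_obs = _pearson(_upper_triangle(genetic), geo_vec)
    n = len(genetic)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel the populations of one matrix
        if _pearson(_upper_triangle(genetic, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Toy matrices (invented): four sites on a line at 0, 1, 3 and 7 km,
# with genetic distance exactly proportional to geographic distance.
positions = [0, 1, 3, 7]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r, p = mantel(gen, geo, permutations=200)
print(round(r, 3), p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test would not be appropriate here.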
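The ΔK statistic used to choose K can be sketched as below. The sketch assumes the commonly used form in which the absolute second-order difference of the mean LnP(D) across replicate runs is divided by the standard deviation of LnP(D) at that K; the function name and the toy likelihood values are invented for illustration and are not output from this study.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to the list of LnP(D) values from replicate runs.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # Delta K is undefined at the smallest and largest K
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        sd = stdev(lnp_by_k[k])
        if sd > 0:  # skip K values whose replicate runs are identical
            result[k] = second_diff / sd
    return result

# Toy LnP(D) values for two replicate runs per K (invented numbers).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -895.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at K = 1 or at the largest K tested, so the LnP(D) curve is usually inspected alongside it.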

Conclusion
While understanding the genetic diversity and composition of disease vectors is crucial for managing them, this information is frequently inadequate. Analysis of the populations yielded two groupings, suggesting two genetically distinct groups (subspecies). Geographical distance and genetic variation showed a moderate to strong significant correlation. Subspecies populations appear to migrate and exchange genes according to their geographic proximity. In Sudan and other African nations, further research is required to understand the ecological factors that influence the distribution and transmission capacity of the two subspecies with respect to the spread of dengue, chikungunya, yellow fever, and other arboviruses, and to create effective virus control initiatives.

Figure 3. Unrooted neighbour-joining tree based on DA genetic distance at seven nuclear microsatellites of Ae. aegypti from eight sites in Sudan. Numbers at the nodes are percentage bootstrap support from 1000 replicates. The scale bar represents 5% sequence divergence.

Figure 4. Bayesian clustering analysis generated through STRUCTURE and STRUCTURE HARVESTER based on eight microsatellite loci of eight Ae. aegypti populations (reduced analysis of 'pure' Group 1) to determine the exact value of K. (a) Results of assignment tests for number of clusters K = 2, indicated along the x-axis. (b) Mean (± SD) log posterior probabilities. (c) Estimate of ΔK for each value of K (putative number of populations). Each vertical line represents one individual, and y coordinates denote each individual's percentage assignment to each of the genetic clusters, represented by a different colour. Numbers 1-8 are the study sites: 1 Port Sudan, 2 Tokar, 3 Kassala, 4 Barakat/Gezira, 5 Kadugli, 6 Nyala, 7 Fasher and 8 Junaynah.

Figure 5. Relationship between pairwise estimates of genetic distance (FST) and geographical distance (km) for Ae. aegypti microsatellite data. The trendline shows the general pattern of increasing genetic distance with greater geographic distance (IBD).

Figure 7. Migration network using divMigrate and based on Nm estimates. Each node represents a population. Closer nodes indicate more gene flow between populations, and stronger arrow colours indicate higher relative migration values. Code for the population names: AED: population from Kadugli, AEJ: population from Junaynah, AEG: population from Barakat/Gezira, AEN: population from Nyala, AEF: population from Fasher, AEP: population from Port Sudan, AEK: population from Kassala, AET: population from Tokar.

Table 3. Summary statistics of microsatellite data of eight populations of Ae. aegypti from Sudan.

Table 4. Linkage disequilibrium between pairs of microsatellite loci of Ae. aegypti populations.

Table 5. Summary statistics of microsatellite data in the two subspecies of Ae. aegypti from Sudan. N, number of individuals genotyped; HO, observed heterozygosity; HE, expected heterozygosity; HWE p value, test for deviation from Hardy-Weinberg Equilibrium; FIS, inbreeding coefficient; bold * indicates significance after Benjamini-Hochberg multiple testing correction at α = 0.05.